Silence the TensorFlow/Keras startup messages on stderr, which can clutter the notebook

In [1]:
import os
import sys
stderr = sys.stderr
sys.stderr = open(os.devnull, 'w')
import keras
sys.stderr = stderr

Import the required libraries

pandas_datareader takes a ticker symbol and fetches its data from Yahoo Finance

datetime helps convert dates into the required formats

In [2]:
import math
import pandas_datareader as web
import numpy as np
import pandas as pd
import datetime as dt
from datetime import datetime
from sklearn.metrics import mean_squared_error

Import the libraries required for building the LSTM

Sequential defines the kind of model we will build: a linear stack of layers

Different layers can be added to fine-tune the prediction

Optimizers are used to compile the model

Callbacks stop or modify the learning process whenever required

In [3]:
from subprocess import check_output
from keras.models import Sequential
from keras.layers import Dense,LSTM, Dropout, GRU, Bidirectional
from keras.optimizers import Adam, SGD
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

Get the stock data by entering the ticker and the date range you want

In [4]:
ticker = input('Enter the ticker:')
start_date = input('Enter start date in YYYY-MM-DD:')
end_date = input('Enter end date in YYYY-MM-DD:')
df = web.DataReader(ticker, data_source='yahoo', start=start_date, end=end_date)
df
Enter the ticker:AMZN
Enter start date in YYYY-MM-DD:2015-01-01
Enter end date in YYYY-MM-DD:2020-12-01
Out[4]:
High Low Open Close Volume Adj Close
Date
2015-01-02 314.750000 306.959991 312.579987 308.519989 2783200 308.519989
2015-01-05 308.380005 300.850006 307.010010 302.190002 2774200 302.190002
2015-01-06 303.000000 292.380005 302.239990 295.290009 3519000 295.290009
2015-01-07 301.279999 295.329987 297.500000 298.420013 2640300 298.420013
2015-01-08 303.140015 296.109985 300.320007 300.459991 3088400 300.459991
... ... ... ... ... ... ...
2020-11-24 3134.250000 3086.260010 3100.500000 3118.060059 3602100 3118.060059
2020-11-25 3198.000000 3140.260010 3141.870117 3185.070068 3790400 3185.070068
2020-11-27 3216.189941 3190.050049 3211.260010 3195.340088 2392900 3195.340088
2020-11-30 3228.389893 3125.550049 3208.479980 3168.040039 4063900 3168.040039
2020-12-01 3248.949951 3157.179932 3188.500000 3220.080078 4544400 3220.080078

1490 rows × 6 columns

Plotting needs Date as a regular column rather than the index, so we reset the index and store the result in df_plot

In [5]:
df_plot = df.reset_index(inplace = False)
df_plot
Out[5]:
Date High Low Open Close Volume Adj Close
0 2015-01-02 314.750000 306.959991 312.579987 308.519989 2783200 308.519989
1 2015-01-05 308.380005 300.850006 307.010010 302.190002 2774200 302.190002
2 2015-01-06 303.000000 292.380005 302.239990 295.290009 3519000 295.290009
3 2015-01-07 301.279999 295.329987 297.500000 298.420013 2640300 298.420013
4 2015-01-08 303.140015 296.109985 300.320007 300.459991 3088400 300.459991
... ... ... ... ... ... ... ...
1485 2020-11-24 3134.250000 3086.260010 3100.500000 3118.060059 3602100 3118.060059
1486 2020-11-25 3198.000000 3140.260010 3141.870117 3185.070068 3790400 3185.070068
1487 2020-11-27 3216.189941 3190.050049 3211.260010 3195.340088 2392900 3195.340088
1488 2020-11-30 3228.389893 3125.550049 3208.479980 3168.040039 4063900 3168.040039
1489 2020-12-01 3248.949951 3157.179932 3188.500000 3220.080078 4544400 3220.080078

1490 rows × 7 columns
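The reset-index step above can be sketched on a toy frame (the data here is made up for illustration):

```python
import pandas as pd

# Toy frame with a DatetimeIndex, mimicking the Yahoo Finance download
toy = pd.DataFrame(
    {'Close': [308.5, 302.2, 295.3]},
    index=pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-06']),
)
toy.index.name = 'Date'

# reset_index(inplace=False) returns a copy with 'Date' as a regular column
toy_plot = toy.reset_index(inplace=False)
print(list(toy_plot.columns))  # ['Date', 'Close']
```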

Create a separate list of the columns we want to work on

In [6]:
cols = list(df_plot)[1:5]
cols
Out[6]:
['High', 'Low', 'Open', 'Close']
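This slicing works because iterating a DataFrame yields its column labels in order, so `list(df_plot)[1:5]` skips Date and picks the next four columns. A self-contained sketch (empty toy frame, made-up column order matching the notebook):

```python
import pandas as pd

toy = pd.DataFrame(columns=['Date', 'High', 'Low', 'Open', 'Close', 'Volume'])
cols = list(toy)[1:5]  # list(df) yields the column labels in order
print(cols)  # ['High', 'Low', 'Open', 'Close']
```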

Create a list of Dates in our data

In [7]:
datelist_train = list(df_plot['Date'])
print('All timestamps == {}'.format(len(datelist_train)))
All timestamps == 1490

Import Plotly libraries - Visualization part begins

In [8]:
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots as ms

Visualize the four price series side by side

Update the layout to change the theme, add a title, and adjust height and width

Add a range slider to filter down to the window you want

In [9]:
figure1 = ms(rows = 2, cols = 2, subplot_titles = ("Open", "Close", "Low", "High"))

figure1.add_trace(go.Scatter(x = df_plot['Date'], y = df['Open'], mode = 'lines'), row = 1, col = 1)

figure1.add_trace(go.Scatter(x = df_plot['Date'], y = df['Close'], mode = 'lines'), row = 1, col = 2)

figure1.add_trace(go.Scatter(x = df_plot['Date'], y = df['Low'], mode = 'lines'), row = 2, col = 1)

figure1.add_trace(go.Scatter(x = df_plot['Date'], y = df['High'], mode = 'lines'), row = 2, col = 2)

figure1.update_layout(height = 1000, width = 1500, title_text = 'Price Summary',
                      showlegend = False, template = 'presentation')

figure1.update_xaxes(rangeslider_visible = True, rangeselector = dict(buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))

figure1.show()

Visualize Volume over the years

In [10]:
fig2 = px.line(df_plot, x = 'Date', y = 'Volume',
              title='Volume of Stocks', 
              template = 'presentation')

fig2.update_xaxes(rangeslider_visible = True, rangeselector = dict(buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))
fig2.show()

Visualize Open vs Close

In [11]:
fig3 = px.line(df_plot, x = 'Date', y = ['Open', 'Close'],
              title = 'Open vs Close', 
              template = 'presentation')

fig3.update_xaxes(rangeslider_visible = True, rangeselector = dict(buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))
fig3.show()

Visualize Low vs High

In [12]:
fig4 = px.line(df_plot, x = 'Date', y = ['Low', 'High'],
              title = 'Low vs High', 
              template = 'presentation')

fig4.update_xaxes(rangeslider_visible = True, rangeselector = dict(buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))
fig4.show()

Convert the columns we selected from df_plot to string format and store them in df1

In [13]:
df1 = df_plot[cols].astype(str)

Convert df1 back to float and store the result in df2 (a round trip that guarantees a clean float64 dtype)

In [14]:
df2 = df1.astype(float)
df2
Out[14]:
High Low Open Close
0 314.750000 306.959991 312.579987 308.519989
1 308.380005 300.850006 307.010010 302.190002
2 303.000000 292.380005 302.239990 295.290009
3 301.279999 295.329987 297.500000 298.420013
4 303.140015 296.109985 300.320007 300.459991
... ... ... ... ...
1485 3134.250000 3086.260010 3100.500000 3118.060059
1486 3198.000000 3140.260010 3141.870117 3185.070068
1487 3216.189941 3190.050049 3211.260010 3195.340088
1488 3228.389893 3125.550049 3208.479980 3168.040039
1489 3248.949951 3157.179932 3188.500000 3220.080078

1490 rows × 4 columns
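The str-to-float round trip leaves numeric values unchanged; a toy sketch (made-up numbers) showing the values survive:

```python
import pandas as pd

toy = pd.DataFrame({'High': [314.75, 308.38], 'Low': [306.96, 300.85]})
df1 = toy.astype(str)     # values become strings like '314.75'
df2 = df1.astype(float)   # and parse back to float64

print(df2.dtypes.tolist())
print(bool((df2 == toy).all().all()))  # True: values survive the round trip
```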

Turn df2 into a NumPy array for our training set

In [15]:
training_set = df2.values
print('Shape of training set == {}.'.format(training_set.shape))
training_set
Shape of training set == (1490, 4).

Out[15]:
array([[ 314.75      ,  306.95999146,  312.57998657,  308.51998901],
       [ 308.38000488,  300.8500061 ,  307.01000977,  302.19000244],
       [ 303.        ,  292.38000488,  302.23999023,  295.29000854],
       ...,
       [3216.18994141, 3190.05004883, 3211.26000977, 3195.34008789],
       [3228.38989258, 3125.55004883, 3208.47998047, 3168.04003906],
       [3248.94995117, 3157.17993164, 3188.5       , 3220.08007812]])
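`.as_matrix()` was deprecated and later removed from pandas; `.values` (or `.to_numpy()`) is the replacement and returns the same ndarray. A minimal sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'High': [314.75, 308.38], 'Low': [306.96, 300.85]})
training = toy.values  # modern replacement for the removed .as_matrix()
print(type(training), training.shape)
```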

Use MinMaxScaler to scale the data.

It scales each feature independently to values between 0 and 1

In [16]:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range=(0,1))
training_set_scaled = sc.fit_transform(training_set)

sc_predict = MinMaxScaler(feature_range=(0,1))
sc_predict.fit_transform(training_set[:,0:1])
Out[16]:
array([[0.0073464 ],
       [0.00539329],
       [0.00374372],
       ...,
       [0.89696024],
       [0.90070088],
       [0.90700482]])
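Min-max scaling is simple enough to write out by hand; this NumPy sketch (toy numbers) is equivalent to what `MinMaxScaler(feature_range=(0, 1))` does per column:

```python
import numpy as np

# Manual min-max scaling: each column is shifted and stretched
# independently so that it spans exactly [0, 1]
x = np.array([[300.0, 2.0],
              [350.0, 4.0],
              [400.0, 6.0]])
scaled = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
print(scaled)
```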

Split the training data into X_train and y_train

n_future is the number of days we want to predict into the future

n_past is the number of past days used for each prediction

Note that y_train is taken from the first column of df2 ('High'), which is also the column sc_predict was fitted on

In [17]:
X_train = []
y_train = []

n_future = 60   # Number of days we want to predict into the future
n_past = 90     # Number of past days we want to use to predict the future

for i in range(n_past, len(training_set_scaled) - n_future +1):
    X_train.append(training_set_scaled[i - n_past:i, 0:df2.shape[1] - 1])
    y_train.append(training_set_scaled[i + n_future - 1:i + n_future, 0])

X_train, y_train = np.array(X_train), np.array(y_train)

print('X_train shape == {}.'.format(X_train.shape))
print('y_train shape == {}.'.format(y_train.shape))
X_train shape == (1341, 90, 3).
y_train shape == (1341, 1).
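The windowing loop can be checked on a tiny array (toy sizes and data, same indexing as above): each sample is the `n_past` rows before step `i`, and the target is the first feature `n_future - 1` steps later.

```python
import numpy as np

n_past, n_future = 3, 2  # much smaller than the notebook's 90/60, same logic
data = np.arange(20, dtype=float).reshape(10, 2)  # 10 timesteps, 2 features

X, y = [], []
for i in range(n_past, len(data) - n_future + 1):
    X.append(data[i - n_past:i, :])      # the n_past rows before step i
    y.append(data[i + n_future - 1, 0])  # target: feature 0, n_future - 1 steps ahead

X, y = np.array(X), np.array(y)
print(X.shape, y.shape)  # (6, 3, 2) (6,)
```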

Let's build the LSTM model

Define a Sequential model, which is a linear stack of layers

return_sequences=True returns the hidden state output for every input time step

The model stacks several LSTM and Dropout layers one after the other

Dense is a layer whose units receive input from all the neurons of the previous layer

Dropout randomly sets input units to 0 during training to prevent overfitting

The model is compiled with the rmsprop optimizer and the 'mse' loss

In [18]:
import time
model = Sequential()
model.add(LSTM(units = 50, return_sequences=True, input_shape = (n_past, df2.shape[1]-1)))
model.add(Dropout(0.2))
model.add(LSTM(units = 50, return_sequences= True))
model.add(Dropout(0.2))
model.add(LSTM(units = 50, return_sequences= True))
model.add(Dropout(0.2))
model.add(LSTM(units = 50))
model.add(Dropout(0.2))
model.add(Dense(units = 1))
start = time.time()
model.compile(loss='mse', optimizer='rmsprop')
print('compilation time : ', time.time() - start)
compilation time :  0.018909454345703125

Specifying the callbacks

EarlyStopping monitors a chosen metric and stops training when it stops improving. verbose prints the epoch at which training is stopped. patience delays the trigger by the given number of epochs.

ReduceLROnPlateau halves the learning rate (factor=0.5) whenever the monitored loss plateaus

ModelCheckpoint saves the best weights seen so far for further use

In [19]:
es = EarlyStopping(monitor='loss', min_delta=1e-10, patience=20, verbose=1)
rlr = ReduceLROnPlateau(monitor='loss', factor=0.5, patience=20, verbose=1) # Factor gives a New LR = LR * factor
mcp = ModelCheckpoint(filepath='weights.h5', monitor='loss', verbose=1, save_best_only=True, save_weights_only=True)

Fit the model with the callbacks and the desired batch size and number of epochs

In [44]:
ep = int(input('Enter the epochs you want: '))
bs = int(input('Enter the batch size you want: '))
history = model.fit(X_train, y_train, epochs= ep, batch_size= bs, callbacks = [es, rlr, mcp])
Enter the epochs you want: 50
Enter the batch size you want: 32
Epoch 1/50
1341/1341 [==============================] - 13s 10ms/step - loss: 0.0035

Epoch 00001: loss did not improve from 0.00314
Epoch 2/50
1341/1341 [==============================] - 20s 15ms/step - loss: 0.0033

Epoch 00002: loss did not improve from 0.00314
Epoch 3/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0036

Epoch 00003: loss did not improve from 0.00314
Epoch 4/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0033

Epoch 00004: loss did not improve from 0.00314
Epoch 5/50
1341/1341 [==============================] - 15s 11ms/step - loss: 0.0034

Epoch 00005: loss did not improve from 0.00314
Epoch 6/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0028

Epoch 00006: loss improved from 0.00314 to 0.00277, saving model to weights.h5
Epoch 7/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0028

Epoch 00007: loss did not improve from 0.00277
Epoch 8/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0027

Epoch 00008: loss improved from 0.00277 to 0.00271, saving model to weights.h5
Epoch 9/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0027

Epoch 00009: loss improved from 0.00271 to 0.00266, saving model to weights.h5
Epoch 10/50
1341/1341 [==============================] - 18s 13ms/step - loss: 0.0029

Epoch 00010: loss did not improve from 0.00266
Epoch 11/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0025

Epoch 00011: loss improved from 0.00266 to 0.00249, saving model to weights.h5
Epoch 12/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0028

Epoch 00012: loss did not improve from 0.00249
Epoch 13/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0025

Epoch 00013: loss did not improve from 0.00249
Epoch 14/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0026

Epoch 00014: loss did not improve from 0.00249
Epoch 15/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0026

Epoch 00015: loss did not improve from 0.00249
Epoch 16/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0026

Epoch 00016: loss did not improve from 0.00249
Epoch 17/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0026

Epoch 00017: loss did not improve from 0.00249
Epoch 18/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0027

Epoch 00018: loss did not improve from 0.00249
Epoch 19/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0022

Epoch 00019: loss improved from 0.00249 to 0.00225, saving model to weights.h5
Epoch 20/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0026

Epoch 00020: loss did not improve from 0.00225
Epoch 21/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0023

Epoch 00021: loss did not improve from 0.00225
Epoch 22/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0024

Epoch 00022: loss did not improve from 0.00225
Epoch 23/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0028

Epoch 00023: loss did not improve from 0.00225
Epoch 24/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0024

Epoch 00024: loss did not improve from 0.00225
Epoch 25/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0025

Epoch 00025: loss did not improve from 0.00225
Epoch 26/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0022

Epoch 00026: loss improved from 0.00225 to 0.00222, saving model to weights.h5
Epoch 27/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0023

Epoch 00027: loss did not improve from 0.00222
Epoch 28/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0023

Epoch 00028: loss did not improve from 0.00222
Epoch 29/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0022

Epoch 00029: loss improved from 0.00222 to 0.00220, saving model to weights.h5
Epoch 30/50
1341/1341 [==============================] - 19s 14ms/step - loss: 0.0026

Epoch 00030: loss did not improve from 0.00220
Epoch 31/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0021

Epoch 00031: loss improved from 0.00220 to 0.00209, saving model to weights.h5
Epoch 32/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0022

Epoch 00032: loss did not improve from 0.00209
Epoch 33/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0020

Epoch 00033: loss improved from 0.00209 to 0.00198, saving model to weights.h5
Epoch 34/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0023

Epoch 00034: loss did not improve from 0.00198
Epoch 35/50
1341/1341 [==============================] - 17s 12ms/step - loss: 0.0021

Epoch 00035: loss did not improve from 0.00198
Epoch 36/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0021

Epoch 00036: loss did not improve from 0.00198
Epoch 37/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0020

Epoch 00037: loss improved from 0.00198 to 0.00196, saving model to weights.h5
Epoch 38/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0020

Epoch 00038: loss did not improve from 0.00196
Epoch 39/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0021

Epoch 00039: loss did not improve from 0.00196
Epoch 40/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0020

Epoch 00040: loss did not improve from 0.00196
Epoch 41/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0019

Epoch 00041: loss improved from 0.00196 to 0.00188, saving model to weights.h5
Epoch 42/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0023

Epoch 00042: loss did not improve from 0.00188
Epoch 43/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0017

Epoch 00043: loss improved from 0.00188 to 0.00171, saving model to weights.h5
Epoch 44/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0020

Epoch 00044: loss did not improve from 0.00171
Epoch 45/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0018

Epoch 00045: loss did not improve from 0.00171
Epoch 46/50
1341/1341 [==============================] - 17s 13ms/step - loss: 0.0019

Epoch 00046: loss did not improve from 0.00171
Epoch 47/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0018

Epoch 00047: loss did not improve from 0.00171
Epoch 48/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0020

Epoch 00048: loss did not improve from 0.00171
Epoch 49/50
1341/1341 [==============================] - 16s 12ms/step - loss: 0.0019

Epoch 00049: loss did not improve from 0.00171
Epoch 50/50
1341/1341 [==============================] - 15s 12ms/step - loss: 0.0018

Epoch 00050: loss did not improve from 0.00171

Generate the sequence of future days to predict

In [45]:
datelist_future = pd.date_range(datelist_train[-1], periods=n_future, freq='1d').tolist()
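Note that `pd.date_range` starts *at* the given date, so the last training day (2020-12-01 in the run above) appears as the first "future" entry. A small sketch with a shortened horizon:

```python
import pandas as pd

# date_range includes the start date itself as the first entry
last_day = pd.Timestamp('2020-12-01')
future = pd.date_range(last_day, periods=5, freq='1d').tolist()
print(future[0].date(), future[-1].date())  # 2020-12-01 2020-12-05
```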

Convert the pandas Timestamps to datetime.date objects, as these are future dates

In [46]:
datelist_future_ = []
for this_timestamp in datelist_future:
    datelist_future_.append(this_timestamp.date())

Perform predictions for Past and Future data

In [47]:
predictions_future = model.predict(X_train[-n_future:])

predictions_train = model.predict(X_train[n_past:])

Helper function: normalize a date or Timestamp to a plain datetime (time set to midnight)

In [48]:
def datetime_to_timestamp(x):
    return datetime.strptime(x.strftime('%Y-%m-%d'), '%Y-%m-%d')
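The helper round-trips through a date string, so any time-of-day component is dropped; for example:

```python
from datetime import datetime
import pandas as pd

def datetime_to_timestamp(x):
    # Round-trip through a date string, dropping any time-of-day component
    return datetime.strptime(x.strftime('%Y-%m-%d'), '%Y-%m-%d')

ts = pd.Timestamp('2020-12-01 15:30:00')
out = datetime_to_timestamp(ts)
print(out)  # 2020-12-01 00:00:00
```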

Inverse-transform the scaled predictions back to price scale and choose a name for the predicted column

Store the training and future predictions in two separate DataFrames

In [49]:
y_pred_future = sc_predict.inverse_transform(predictions_future)
y_pred_train = sc_predict.inverse_transform(predictions_train)

cyw = input('Enter the name of the variable you want to predict: ')

PREDICTIONS_FUTURE = pd.DataFrame(y_pred_future, columns=[cyw]).set_index(pd.Series(datelist_future))
PREDICTION_TRAIN = pd.DataFrame(y_pred_train, columns=[cyw]).set_index(pd.Series(datelist_train[2 * n_past + n_future -1:]))
Enter the name of the variable you want to predict: Close

Convert Datetime to Timestamp for the index of PREDICTION_TRAIN

In [50]:
PREDICTION_TRAIN.index = PREDICTION_TRAIN.index.to_series().apply(datetime_to_timestamp)

PREDICTION_TRAIN # These are all the Predictions for our Training data
Out[50]:
Close
2015-12-14 581.946228
2015-12-15 582.613342
2015-12-16 583.383179
2015-12-17 584.185059
2015-12-18 584.898865
... ...
2020-11-24 3115.769043
2020-11-25 3113.068848
2020-11-27 3110.791016
2020-11-30 3109.412354
2020-12-01 3109.275391

1251 rows × 1 columns

These are the predictions for the future, i.e. 60 days ahead

In [51]:
PREDICTIONS_FUTURE
Out[51]:
Close
2020-12-01 3162.400146
2020-12-02 3136.250977
2020-12-03 3109.713867
2020-12-04 3084.322754
2020-12-05 3060.994629
2020-12-06 3039.977051
2020-12-07 3021.262695
2020-12-08 3005.684082
2020-12-09 2996.737061
2020-12-10 3001.921631
2020-12-11 3022.667480
2020-12-12 3047.787598
2020-12-13 3077.748291
2020-12-14 3120.347900
2020-12-15 3148.601807
2020-12-16 3152.804688
2020-12-17 3160.974854
2020-12-18 3170.069580
2020-12-19 3177.697998
2020-12-20 3184.259766
2020-12-21 3190.564453
2020-12-22 3197.131348
2020-12-23 3204.079834
2020-12-24 3211.031982
2020-12-25 3216.746826
2020-12-26 3220.248779
2020-12-27 3220.518066
2020-12-28 3218.250732
2020-12-29 3214.251465
2020-12-30 3207.844238
2020-12-31 3197.920654
2021-01-01 3165.805908
2021-01-02 3161.803223
2021-01-03 3114.893799
2021-01-04 3146.125000
2021-01-05 3158.926270
2021-01-06 3166.267578
2021-01-07 3169.946289
2021-01-08 3170.475098
2021-01-09 3167.761230
2021-01-10 3162.581787
2021-01-11 3156.709229
2021-01-12 3151.230469
2021-01-13 3146.636475
2021-01-14 3143.278076
2021-01-15 3140.618896
2021-01-16 3138.100342
2021-01-17 3135.709961
2021-01-18 3133.711670
2021-01-19 3131.561523
2021-01-20 3129.433350
2021-01-21 3126.926758
2021-01-22 3124.253662
2021-01-23 3121.412842
2021-01-24 3118.572754
2021-01-25 3115.769043
2021-01-26 3113.068848
2021-01-27 3110.791016
2021-01-28 3109.411865
2021-01-29 3109.274902

Create another DataFrame, date_col, with the predicted column from the original data

In [52]:
date_col = df.filter(['Date', cyw])
date_col
Out[52]:
Close
Date
2015-01-02 308.519989
2015-01-05 302.190002
2015-01-06 295.290009
2015-01-07 298.420013
2015-01-08 300.459991
... ...
2020-11-24 3118.060059
2020-11-25 3185.070068
2020-11-27 3195.340088
2020-11-30 3168.040039
2020-12-01 3220.080078

1490 rows × 1 columns

Create a final DataFrame that concatenates the actual column and the predicted values. Use this for the final visual.

In [53]:
fin_mod = pd.concat([date_col, PREDICTION_TRAIN, PREDICTIONS_FUTURE], axis = 1)
fin_mod2 = fin_mod.reset_index(inplace = False)
fin_mod2.columns = ['Date', 'Actual Price', 'Training Predictions', 'Future Predictions']
fin_mod2
Out[53]:
Date Actual Price Training Predictions Future Predictions
0 2015-01-02 308.519989 NaN NaN
1 2015-01-05 302.190002 NaN NaN
2 2015-01-06 295.290009 NaN NaN
3 2015-01-07 298.420013 NaN NaN
4 2015-01-08 300.459991 NaN NaN
... ... ... ... ...
1544 2021-01-25 NaN NaN 3115.769043
1545 2021-01-26 NaN NaN 3113.068848
1546 2021-01-27 NaN NaN 3110.791016
1547 2021-01-28 NaN NaN 3109.411865
1548 2021-01-29 NaN NaN 3109.274902

1549 rows × 4 columns
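The NaN pattern in fin_mod comes from index alignment: `pd.concat(..., axis=1)` takes the union of the indexes, and dates missing from a column get NaN. A toy sketch with made-up dates and values:

```python
import pandas as pd

actual = pd.Series([1.0, 2.0],
                   index=pd.to_datetime(['2020-11-30', '2020-12-01']),
                   name='Actual')
future = pd.Series([3.0, 4.0],
                   index=pd.to_datetime(['2020-12-02', '2020-12-03']),
                   name='Future')

# concat on axis=1 aligns on the index; non-overlapping dates get NaN
merged = pd.concat([actual, future], axis=1)
print(merged)
```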

Model Evaluation with RMSE

In [54]:
rmse = np.sqrt(np.mean((PREDICTION_TRAIN[cyw] - date_col[cyw])**2))
rmse
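For RMSE the errors must be squared *before* averaging: `np.sqrt(np.mean(diff)**2)` collapses to the absolute mean error instead. A quick check on toy numbers, agreeing with sklearn's `mean_squared_error`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([300.0, 310.0, 320.0])
y_pred = np.array([305.0, 308.0, 318.0])

# Square each error first, then average, then take the root
rmse_manual = np.sqrt(np.mean((y_pred - y_true) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse_manual)
```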

Visualize the Predictions vs Actual prices

In [55]:
fig5 = px.line(fin_mod2, x = 'Date', y = ['Actual Price', 'Training Predictions', 'Future Predictions'],
              title='Model', 
              template = 'presentation')

fig5.update_xaxes(rangeslider_visible = True, rangeselector = dict(buttons=list([
            dict(count=1, label="1m", step="month", stepmode="backward"),
            dict(count=6, label="6m", step="month", stepmode="backward"),
            dict(count=1, label="YTD", step="year", stepmode="todate"),
            dict(count=1, label="1y", step="year", stepmode="backward"),
            dict(step="all")])))

fig5.show()